Dysregulated gene subnetworks in breast invasive carcinoma reveal novel tumor suppressor genes

Breast invasive carcinoma (BRCA) is the most malignant and leading cause of death in women. Global efforts are ongoing for improvement in early detection, prevention, and treatment. In this milieu, a comprehensive analysis of RNA-sequencing data of 1097 BRCA samples and 114 normal adjacent tissues is done to identify dysregulated genes in major molecular classes of BRCA in various clinical stages. Significantly enriched pathways in distinct molecular classes of BRCA have been identified. Pathways such as interferon signaling, tryptophan degradation, granulocyte adhesion & diapedesis, and catecholamine biosynthesis were found to be significantly enriched in Estrogen/Progesterone Receptor positive/Human Epidermal Growth Factor Receptor 2 negative, pathways such as RAR activation, adipogenesis, the role of JAK1/2 in interferon signaling, TGF-β and STAT3 signaling intricated in Estrogen/Progesterone Receptor negative/Human Epidermal Growth Factor Receptor 2 positive and pathways as IL-1/IL-8, TNFR1/TNFR2, TWEAK, and relaxin signaling were found in triple-negative breast cancer. The dysregulated genes were clustered based on their mutation frequency which revealed nine mutated clusters, some of which were well characterized in cancer while others were less characterized. Each cluster was analyzed in detail which led to the identification of NLGN3, MAML2, TTN, SYNE1, ANK2 as candidate genes in BRCA. They are central hubs in the protein–protein-interaction network, indicating their important regulatory roles. Experimentally, the Real-Time Quantitative Reverse Transcription PCR and western blot confirmed our computational predictions in cell lines. Further, immunohistochemistry corroborated the results in ~ 100 tissue samples. We could experimentally show that the NLGN3 & ANK2 have tumor-suppressor roles in BRCA as shown by cell viability assay, transwell migration, colony forming and wound healing assay. The cell viability and migration was found to be significantly reduced in MCF7 and MDA-MB-231 cell lines in which the selected genes were over-expressed as compared to control cell lines. The wound healing assay also demonstrated a significant decrease in wound closure at 12 h and 24 h time intervals in MCF7 & MDA-MB-231 cells. These findings established the tumor suppressor roles of NLGN3 & ANK2 in BRCA. This will have important ramifications for the therapeutics discovery against BRCA.

Table 1.Summary of dysregulated genes in early and late stage in each of the three molecular classes.The table shows significantly dysregulated genes with a log2 fold change greater than or less than 1.In early stage, there was a greater number of dysregulated genes as compared to late stage.The log2 fold change varied from 1-13 in early stages and 1-20 in late stages.In the case of late-stage ER/PR-/HER-2+, the analysis could not be done as there was no sample of NAT to compare with.The larger number of genes were found to be downregulated in all classes.www.nature.com/scientificreports/Early stage ER/PR+/HER-2− Three clusters were obtained, out of which one is significantly mutated (p-value < 0.05) containing seven genes viz.SYT2, SYT6, SYT9, SYT13, NRXN1, NLGN1 and NLGL3 (Fig. 1a).Among them, 2 were upregulated and 5 were downregulated with average log 2 fold change (FC) as 2.21 Table 4 The pathway analysis indicated that the cluster may be involved in TR/RXR activation pathway (supplementary fig.S2), which activates PI3K pathway and is involved in cell survival and growth 5 .
Early stage TNBC Two mutated subnetworks were identified and the most mutated composed of 8 genes viz.TTN, TTC3, MYOM1, MYOM2, SYNE1, NEO1, NTN4 and UNC5B (average log 2 FC = 2.17).All genes in the subnetwork were significantly downregulated except UNC5B.(Fig. 1c, Table 6).TTN and SYNE1 are known driver genes in BRCA and were significantly mutated as compared to others 6 .Pathway analysis showed enrichment of netrin signaling, RhoA signaling, and axonal guidance signaling pathways (supplementary fig.S4a, S4b, S4c).Axonal guidance signaling play a crucial mechanism in tumor suppression and oncogenesis and can be promising targets for therapeutics 7,8 .www.nature.com/scientificreports/Late stage TNBC Two mutated clusters were found and most significant had 14 genes (Fig. 1d, Table 7) (average log 2 FC 2.59).It is important to note that TNS1, DMD, ANK2 6 , and AHNAK 9 are known driver genes in BRCA.The cluster was found to be involved in galactose degradation, colanic acid building blocks biosynthesis, and nNOS signaling pathway (supplementary fig.S5a, S5b, S5c).It has been reported that the knockdown of galactokinase controls the growth of hepatoma cells in vitro 10 and galactose metabolism pathways are dysregulated in breast cancer 11 .
The cluster from all classes contains total 35 genes.It was interesting to see many genes in the identified clusters which are not yet reported to be directly involved in BRCA.Initial analysis indicated that these genes have reported interactions with known cancer genes.Some of them are regulators of genes having known role in BRCA.Additionally, many of them were also found to be dysregulated across other cancers 12 (Supplementary fig.S6).Therefore, it is reasonable to assume that they may also play some role in BRCA.The genes were further examined for mutations e.g.inframe, truncating, missense or others using cBioPortal 13 .A large number of mutations were found in TTN, SYNE1, ANK2, and NLGN3.In fact, few genes showed mutation frequency greater than BRCA1/BRCA2.

Detailed mutational analysis of significant clusters
Early stage ER/PR+/HER-2−: NLGN3, NRNX1, NLGN1 cluster In NLGN1 and NLGN3, carboxylesterase domain contains the majority of the mutations whereas in NRXN1, laminin G domain contains the majority of mutations.The mutations in the laminin domain may contribute to EMT by affecting cell-to-cell adhesion by binding to integrin α6β4 (β4 subunit) expressed by epithelial cells.This promotes tumor cell survival, invasion, and migration via signaling through RAC and PI3K pathway 14 .In other members (SYT2, SYT6, SYT9 and SYT13), C2 domain was affected.The C2 domains of SYT family are involved in exocytosis.The alterations of exocytic proteins are known to be associated with malignant transformations 15 .The altered exocytic networks are involved in the acquisition of premetastatic traits 16 .Therefore, the identification of such a cluster is no surprise.The details of the mutations in each gene are given in Table 8 and supplementary Table S4a.Such mutations which can impact protein function have been plotted against tumor samples (Fig. 2a).Among all genes in this subnetwork, the NLGN3 showed a higher mutation rate, equal to BRCA1.NLGN3 as a candidate gene in early stage ER/PR+/HER-2−.The fact that the NLGN3 is dysregulated and highly mutated in a number of samples makes it an interesting candidate.NLGN3 is a protein-coding gene that is reported to be involved in Asperger's syndrome and autism X-linked 1.The gene was found to be significantly downregulated (log 2_ FC = − 1.06) (supplementary fig.S7a) and interacts with other genes viz.DLG4, GABRA1, OR1F12, CYFIP2, GABRA2, AGAP2, NLGN4X, UTP15, NLGN4Y, TMEM39A, TYK2, RECQL4, PIK3CB, PIK3CA, NLGN1 17 .Many of them are well-known cancer-associated genes.The NLGN3 was also found to have a role in cell-cell adhesion and cell differentiation 18 .Additionally, it regulates GRIN2B 19 , which is reported to play a significant role in cancer development and progression.In NLGN3, total of sixteen mutations were found in different domains; among them ten were predicted deleterious.Figure 3a shows that carboxylesterase family, neurolgin-3, and alpha-beta hydrolase domain got significantly enriched in mutations.Therefore, considering the above reasons, the gene NLGN3 can be a good candidate gene.The mutations in the EGF-like domain lead to the inactivation of NOTCH 20 .Other members were found to be less mutated and clustered due to their association with NOTCH.The details of different mutations in each of the gene is given in Table 9 and supplementary table S4b.MAML2 harbored total 12 mutations, out of which, 4 were predicted to be deleterious.Figure 2b shows the percent mutation rate for each member.Although percent mutation rate for MAML2 is less, it interacts with NOTCH1 and NOTCH2.Its precise role in BRCA pathogenesis is not yet known.
MAML2 as a candidate gene in early stage ER/PR−/HER-2+.MAML2, a mastermind like transcriptional coactivator 2, encodes gene of mastermind-like family of proteins.The gene is reported to be involved in Notch signaling and pre-NOTCH expression and processing pathways.The N-terminal domain binds CREBBP/CBP 21 and CDK8 22 .It regulates various genes viz.ID2, NR4A2, FOS, STC1, NR4A1, CREM, ATF3, MAF, CTH, KRT17, PTN, CRYAB, THBS1, CGB3, HAS2 17 , many of them are known for their association to BRCA pathogenesis .In BRCA, the neurogenic mastermind like N-terminal domain in MAML2 was found to be affected by many deleterious mutations (Fig. 3b).It was interesting to find MAML2 to be downregulated in early stage ER/ PR−/HER-2+ samples (log 2 _FC = − 2.14) (supplementary fig.S7b).Being co-activator of NOTCH protein, which is having a dual role as an oncogene and tumor-suppressor 33 , MAML2 might be having tumor-suppressor role in BRCA or might be leading to oncogenesis through some other pathways, but the molecular mechanism is not yet clear.Therefore, it would be interesting to understand the role of MAML2 downregulation in BRCA samples.

Early stage TNBC: TTN and interactors
The TTN was found to be the most mutated gene followed by SYNE1.The other members of networks were less mutated with somatic mutation frequency (SMF) < = 1%.In TTN, the total mutations were 438, (366-missense, 72-truncating and 4-inframe).In SYNE1, the total mutations were found to be 105, (79-missense, 20-truncating, Table 10 and supplementary table S4c).The percent mutation rate for TTN and SYNE1 is significantly high which may make them ideal candidate genes in BRCA (Fig. 2c).
It was found to have 22 deleterious mutations out of 79 missense mutations affecting CH and spectrin repeat domain (Fig. 3d).The spectrin-based domain protects epithelial cells from mechanical stress and is involved in the homeostasis of water and salt.The mutation in the spectrin domain may lead to its loss of function which results in defective TGF-B signaling and cell cycle deregulation.The loss of this domain is a marker of metastatic cancer cells 43 .The plot shows spectrin/alpha-actinin and spectrin repeat domain got significantly enriched in mutations.
Late stage TNBC: DMD, ANK1, ANK2 and interactors DMD and ANK2 were found to be highly mutated.DMD is a well-known gene 44 but there are no reports for the role of ANK2 in BRCA.ANK2 (SMF = 2.0%) directly interacts with DMD (SMF = 3.6%) and most mutations in ANK2 and DMD were found to be missense.ANK1 and AHNAK had SMF as 1.5% and 1.4% respectively and other members of the cluster were less mutated (SMF < 1%).Among all, six were well-known in BRCA while the rest (NBPF3, ANK1, NFASC, CHL1, ANK2, CEP120, GAK2, and CCDC18) are not reported yet.Out of these, we found ANK2 as the most mutated (Table 11 and supplementary table S4d) with a mutation rate higher than BRCA/BRCA2.(Fig. 2d).
ANK2 as candidate gene in late stage TNBC.It is protein-coding gene, encoding ankyrin-family of proteins.The binding site in ankyrins facilitates binding of integral membrane proteins with spectrin-actin based membrane cytoskeletal.This linkage maintains the integrity of plasma membrane and anchors specific ion channels.They play role in cellular activities e.g.motility and proliferation 45 .The gene is significantly downregulated (log 2 FC = − 2.2) (supplementary fig.S7e) in TNBC harboring 40 mutations, out of which 32 were missense.The ankyrin repeat domain and ZU5 domains were mainly affected by missense mutations (Fig. 3e).Ankyrin repeat domains are present in diverse proteins and act as platform for interactions with other proteins.The ankyrin repeat domain binds to miRNA, causing a loss in their function and drives proliferation in renal cancer cells 46 .
Additionally, the overexpression of ankyrin repeat domain is related to the drug resistance in lung cancer 47 .The ZU5 domain is involved in the induction of apoptosis and binds to the NRAGE domain of UNC5H.The ANK2 regulates ITPR1, ITPR, RYR2, MAPK1, ERK1/2, SPTBN1, focal adhesion kinase, SCN8A and ANK3.The ANK2

Experimental validation
Validation of predictions is a quintessential part of scientific research.In the current studies to validate the predictions, one gene from each of the subnetworks was picked, thus four genes viz.NLGN3, ANK2, MAML2 and SYNE1 were selected.The transcript level of selected genes in different breast cancer cell lines MCF7, T47D, MDA-MB-231 was determined.

qRT-PCR analysis
The amplification of GAPDH was completed earlier than other candidate genes as revealed by CT value, indicating that the genes are down-regulated in cancer cell lines.The expression of NLGN3 was 0.12, 0.15, 0.30-fold downregulated in MCF7, T47D, and MDA-MB-231 respectively.The expression of MAML2 was 0.16, 0.33, and 0.29-fold (downregulated) in MCF7, T47D, and MDA-MB-231 respectively.The expression of SYNE1 was 0.08 and 0.17-fold (downregulated) in MCF7 and MDA-MB-231 respectively and 1.44-fold upregulated in T47D.

Western blot analysis
The blot images of ANK2, SYNE1 and NLGN3 are represented in Fig. 4e-g respectively.The unpaired student's t-test showed the decrease in protein expression of ANK2, SYNE1 and NLGN3 was significant in the cancer cell lines MDA-MB-231 (p-value < 0.05) when compared with control MCF10A and decrease in protein expression www.nature.com/scientificreports/ of SYNE1 was significant in MCF7 breast cancer cell line (Fig. 4h-j).The full length western blots are given in Supplementary Information file.

SYNE1:
The tissue expression of SYNE1 is decreased in each of the four categories.The mean expression was found to be 10.41, 8.86, 7.25, 6.33 and 11.83 in early stage ER/PR+/HER-2−, late stage ER/PR+/HER-2−, early stage TNBC, late stage TNBC and NATs respectively.The Mann-Whitney test revealed a significant decrease in SYNE1 expression in early stage TNBC (p-value = 0.0410) (Fig. 5b, e).The tissue expression of SYNE1 is also decreased in grade 2 and 3 tumors but was not statistically significant (Fig. 5h, k).

Elucidation of tumor suppressor role of ANK2 and NLGN3 using cell viability, colony forming, transwell migration and wound healing assay
To elucidate the involvement of ANK2 and NLGN3 in migratory and proliferative capacity of MCF7 and MDA-MB-231 cells, we performed transwell migration and cell proliferation assay and results showed ANK2 and NLGN3 overexpression decreased migration and proliferation of these cells.Next, colonogenic assay was performed to assess the impact of ANK2 and NLGN3 on colony formation, and the result showed small and a fewer numbers of colonies in both MCF7 and MDA-MB-231 cells overexpressing ANK2 and NLGN3 compared to control cells.Further, we investigated the role of ANK2 and NLGN3 in wound healing and found that ANK2 www.nature.com/scientificreports/and NLGN3 overexpression impeded the migration of MCF7 and MDA-MB-231 cells (Fig. 6).Collectively, the in vitro findings suggest the role of ANK2 and NLGN3 in breast tumorigenesis suggesting it may be a potential drug target for breast cancer therapeutics.

Proposed mechanism for the role of nesprin1 (SYNE1) protein in breast invasive carcinoma
The pathway describing the role of SYNE1 (Spectrin Repeat Containing Nuclear Envelope Protein 1) in the progression of breast invasive carcinoma has been proposed with the help of our bioinformatics analysis and detailed literature survey.SYNE is a protein that is found in the outer nuclear membrane which is comprised of N-terminal actin binding domain (ABD), a rod domain with variable spectrin repeats (SRs), and a KASH domain (KASH peptide and ONM (outer nuclear membrane) transmembrane domain) at C-terminal.At the ABD domain, SYNE 1/2 binds with actin and at the KASH peptide, it binds with SUN domain of SUN1/2 protein, forming a LINC (Linker of Nucleoskeleton and Cytoskeleton) complex.The N-terminal domain of SUN protein binds to nuclear lamina (LaminA).The LINC complex therefore links actin present in cytoplasm to nuclear lamina underlying the INM (inner nuclear membrane) and performs different cellular functions as (A) facilitates mechano-transduction, (B) formation of nuclear pore complex, (C) attaching centrosome to ONM D) provides stability and strength to cells (Fig. 7A) 48,49 .
In the case of BRCA, there is a significant downregulation of SYNE1 protein as revealed by our analysis.This indicates that SYNE1 is less available to bind with SUN protein resulting in downregulation of LINC complex.Once the LINC complex is downregulated, the LaminA becomes free and its expression is upregulated.The LaminA has been reported to influence RhoA and its effector activity and acts as a negative regulator of ROCK2 protein 50,51 .This suggests that as the LaminA is upregulated, the expression of ROCK2 gets down-regulated due to a negative feedback mechanism.The ROCK phosphorylates and activates PTEN which is a tumor suppressor gene [52][53][54] .Therefore, when ROCK2 gets downregulated, PTEN is not able to get phosphorylated and remains inactive.PTEN is a phosphatase protein that dephosphorylates phosphatidylinositol-3,4,5-triphosphate (PIP3), which is the product of PI3K into phosphatidylinositol 4,5-bisphosphate (PIP2).The loss of PTEN function leads to the accumulation of PIP3 and enhanced activation of its subsequent downstream effectors as PDK1, AKT/ PKB and CDK-cyclin.This eventually results in cell cycle progression, proliferation, cell survival and migration (Fig. 7B).

Discussion
Understanding the molecular subtypes of breast cancer is important for predicting prognosis, guiding treatment decisions, and developing targeted therapies to improve outcomes for patients with breast cancer.Identification of dysregulated genes in different molecular subtypes can provide valuable insights into the underlying biological mechanisms driving the disease.It can also help in the development of personalized treatment approaches tailored to the molecular characteristics of the tumor.Among the three molecular classes, TNBC is the most aggressive one.The five-year survival rate is lower for TNBC than other subtypes, perhaps due to lack of molecularly targeted therapies.Therefore, efforts are being done to develop new diagnostic and prognostic approaches.For example a ten gene expression signature has been reported to be associated with TNBC, in particular for Mexican patients 55 .In TNBC different dysregulated noncoding RNAs were analyzed 56 and molecular pathways were identified 57 .The overall and disease-free survival was related to certain gene expressions in TNBC 58,59 .
The genomics differences between African and Caucasian American women were studied and 20 genes were identified that segregated two classes 60 .ITGA11 and Jab1 were identified as biomarker for breast cancer 61 .The mutations in important loci as BRCA1, BRCA2, PTEN, ATM, TP53, CHEK2, PPM1D, CDH1, MLH1, MRE11, MSH2, MSH6, MUTYH, NBN, PMS1, PMS2, BRIP1, RAD50, RAD51C, STK11 and BARD1 were related with risk of breast cancer 62 .The relation between somatic mutations and prognosis was determined and mutations in PIK3R1 and DDR1 were found to be associated with poor outcomes in HR+ breast cancer 63 .This is the first study to identify subnetworks of genes in three molecular classes (ER/PR+/HER-2−, ER/PR-/ HER-2+, and TNBC) in different clinical stages based on genetic dysregulation and mutations.
This study reports five genes viz.neuroligin3 (NLGN3), mastermind-like transcriptional coactivator 2 (MAML2), titin (TTN), ankyrin2 (ANK2) and spectrin repeat containing nuclear envelop protein 1 (SYNE1) to be significantly associated with distinct molecular classes of BRCA (ER/PR+/HER-2−, ER/PR-/HER-2+ and TNBC).The genes are involved in important processes such as chemotaxis and axon guidance, notch binding, cell adhesion molecule binding etc.They are central genes in the protein-protein-interaction (PPI) network indicating they can have important regulatory roles.The identified genes were found to be involved in different biological pathways such as TR/RXR activation pathway, Notch signaling pathway, regulation of EMT, Netrin signaling, RhoA signaling, axonal guidance signaling pathways, and galactose degradation I pathway.These pathways have important roles in cellular processes such as proliferation, differentiation, migration, survival and predispose oncogenesis.
In late stages, calcium signaling, cholesterol biosynthesis, FXR/RXR activation, and AMPK signaling pathways were (p-value < 0.010) found to be associated with ER/PR/+/HER-2−.Whereas E1F2 signaling, mTOR signaling, renin-angiotensin, SAPK/JNK signaling and IL-10 signaling pathways (p-value < 0.014) were found enriched in TNBC.One of the recent studies showed that genetic alterations in SYNE1 are found in 10% of gynecologic malignancies and 5% of epithelial ovarian cancers 64 .Another study demonstrated that NLGN3 endorses glioma progression by upregulating LYN and ADAM10 activity, which in turn promotes NLGN3 cleavage to form a positive feedback loop 65 .In a study by Cho et al. 66 ANK2 was hypermethylated in canine mammary tumor and was also highlighted as potential tissue biomarker.Additionally, ANK2 showed significant hypermethylation in the plasma cfDNA of canine mammary tumors, indicating it could be a possible liquid biopsy biomarker as well.
Our findings were further validated using qRT-PCR, western blotting and immunohistochemistry which show decreased expression of NLGN3, MAML2, ANK2, and SYNE1 in breast cancer cell lines as MCF7 and MDA-MB-231 as compared to control cell line MCF10A.The tumor suppressor role of the selected genes ANK2 and NLGN3 was elucidated by cell viability assay, transwell migration, colony-forming and wound healing assay in MCF7, MDA-MB-231 and MCF10A cell lines.ANK2 and NLGN3 overexpression showed a reduction in migration and proliferation of MCF7 and MDA-MB-231 cells.
The present study revealed that the genes viz.NLGN3, MAML2, TTN, SYNE1, and ANK2 are significantly correlated with BRCA.They interact with numerous other genes associated with cell proliferation, survival, migration, and metastasis.The genes initiate and stimulate the proliferation of cancer cells by impeding the functions of downstream genes and disrupting normal cellular processes.
The results suggest that candidate genes may serve as useful biomarkers for different molecular classes of BRCA.The subnetworks containing gene components can be targeted for developing novel therapeutics for BRCA.

Material and methods
The different molecular classes of BRCA have been analyzed for genes that are differentially expressed in early and late stages, dysregulated biological pathways, and cancer progression using integrated computational analysis.The flowchart of the methodology is shown in Fig. 8.

Data collection
The transcript expression data for BRCA tissue and normal adjacent tissue (NAT) was obtained from cancer RNA-seq nexus (CRN) (Jian et al. 67 ).The data contained transcripts expression values (total 71,361 (coding and non-coding  12.However, due to the small sample size in each stage, we classified data into two groups i.e. early (I, II) and late (III, IV) stage for the purpose of analysis.The transcripts with significant p-value (< 0.05) were kept for further analysis.The mean expression of transcripts that belong to a gene was considered as the expression for that gene.

Differential expression study
The fold change was calculated using TPM values in tumor and NAT.The genes with a log 2 FC greater than or less than 1 and p-value < 0.05 were considered up or downregulated respectively 68 .The differential expression pattern was studied in the early and late stages in each of the three class.Further, due to non-availability of NAT sample in late ER/PR−/HER-2+ class, it was removed from the study.Finally, five classes (1) Early stage ER/PR+ / HER-2−.(2) late stage ER/PR+/HER-2−.(3) Early stage ER/PR−/HER-2+.(4) Early stage TNBC and (5) Late stage TNBC were analyzed in detail.The biological pathways in each class were also analyzed to check similarities and differences in terms of affected pathways in different classes.

Clustering of genes into subnetworks
The genes were clustered based on differential expression, protein-protein interaction (PPI) and mutational frequency using Hotnet2 algorithm.The hotnet2 is an algorithm to find out mutated clusters and overcomes limitations with existing algorithms as MEMo, HotNet etc.These clusters can have co-occurring mutations across cancer samples and their component genes may work together with upstream/downstream genes to drive biological pathway(s).The purpose was to uncover modules, pathways, and processes which are getting affected in different classes of BRCA.The mutation data for BRCA was collected from ICGC 66 and subnetworks were created using Hotnet2 algorithm (supplementary data 1).

Pathway analysis
Pathway analysis was performed using IPA (Ingenuity-Pathway-Analysis), RRID: SCR_008653, using p-value cutoff of 0.05 to uncover functional role of various modules (subnetworks).

Quantitative-real-time polymerase-chain-reaction (qRT-PCR)
Breast cancer cell lines MCF7, T47D and MDA-MB-231 were purchased from National Centre for Cell Sciences (NCCS), Pune, India and MCF10A was a kind gift from Dr. Annapoorni Rangarajan (IISC, Bangalore, India).The total RNA was isolated from MCF7, T47D, MDA MB-231 and MCF10A cells using Tri-reagent (Sigma-Aldrich).Reverse transcription PCR and qRT-PCR was performed using primers of NLGN3, MAML2, SYNE1 and ANK2.The detailed methodology and primer sequences are given in supplementary data 2. GAPDH was taken as an www.nature.com/scientificreports/internal control and ΔΔCT values were calculated.The statistical analysis was performed using GraphPad Prism (RRID: SCR_002798), which includes unpaired student's t-test for estimation of statistical significance.

Western blot analysis
The whole cell lysate of MCF7, MDA-MB-231 and MCF10A were prepared using RIPA buffer.The samples were run in 10% SDS-PAGE gel, transferred on PVDF membrane (Millipore) and blocked with 5% (w/v) non-fat milk (Sigma).Blots were then incubated with primary antibody (ANK2, SYNE1, NLGN3) overnight at 4 °C with gentle shaking.The membrane was then washed with 1× TBS-T and incubated with anti-rabbit or anti-mouse horseradish peroxidase-conjugated secondary antibody (1:5000 dilution) for 1 h.After washing, the blots were developed using Western Blot Chemiluminescence HRP Substrate (Takara Bio Inc.) in Chemidoc XRS+ molecular 228 imager (Bio-Rad, Hercules, CA, USA).The images were quantified using Image J software.The detailed methodology is given in supplementary data 3.

Immunohistochemistry
The tissue microarray slides (TMA) were purchased from US Biomax for each of the genes (NLGN3, MAML2, SYNE1 and ANK2, RRID: AB_1857707 and RRID: AB_2667965 Four slides were stained by anti-MAML2, anti-ANK2, anti-SYNE1 and anti-NLGN3 antibodies separately at 1:50 dilution and were further processed using the ABC system (Vector Laboratories, Bulingame, CA, USA) as described previously 67 .The images were captured under Leica microscope using LAS EZ software version 2.1.0at 40× magnification.The tissue spots were scored by pathologist in a blinded experiment.The score for the intensity of staining and percentage of positive cells were given as per the Allred scoring system.The staining intensity was scored as: 0 for negative, 1 for weak, 2 for intermediate and 3 for strong.The percentage positive cells were scored between 0 and 100%: 0 for no cells, 1 for < 1% cells, 2 for 1-10%, 3 for 11-33%, 4 for 34-66% and 5 for 67-100%.Quick score for each sample was calculated by multiplying the intensity score by score for percentage positive cells.The statistical analysis was done with quick score and graph was plotted.

Cell viability assay
Cell viability assay was performed to check the proliferation rate of MCF7 and MDA-MB-231 control upon NLGN3 and ANK2 overexpression.Control and overexpressed MCF7 and MDA-MB-231 cells were seeded at a density of 3 × 10 3 cells per well in 96-well plates.Cells were incubated for 48 h.After the stipulated time, 10 μl of 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) at a concentration of 5 mg/ml were added into each well and incubated for 4 h.After incubation, the media was removed and 100 μl of DMSO was added to dissolve the purple formazan crystals.The absorbance was taken at a wavelength of 570 nm in a Varioskan™ flash multimode reader (Thermo Scientific).Three independent sets of experiments were performed.The percent viability was calculated by the formula-%viability = A/A 0 where A 0 and A are the absorbances of the vehicle control and NLGN3/ANK2 overexpresses cells respectively.

Colony forming assay
For colony forming assay, briefly after transfecting of MCF7 and MDA-MB-231 cells with NLGN3 and ANK2, 0.6 × 10 3 control, NLGN3, and ANK2 overexpressed cells were seeded in triplicates in 60 mm-plates.The plates were kept at 37 °C, 5%CO2 chamber for 1 week to allow the growth of colonies (50 cells per colony).Then the cells were washed with 1× PBS, fixed with 10%(v/v) formalin, and then stained with 0.01%(w/v) crystal violet solution.Excess stain was removed by washing twice with 1XPBS.The plate was air-dried followed by imaging in Gel Doc™ XR + Imager (Bio-Rad).The stained cells were dissolved in 10% (v/v) acetic acid and the absorbance was quantified at 570 nm using Varioskan™ Flash Multimode Reader (Thermo Scientific).Colony formation rate was calculated by the formula-Colony forming rate = 100% × (experimental absorbance value/control absorbance value).

Transwell migration assay
Transwell migration assay was performed following manufacturer's protocol (BD Falcon, USA).)48 h posttransfection, control, NLGN3 and ANK2 overexpressed MCF7 and MDA-MB-231 cells were seeded at a density of 0.5 × 10 3 cells in upper chamber of 24 well trans-well system in 500 μl of DMEM media.Medium supplemented with 5% serum was used as a chemoattractant in the lower chamber.After 24 h the cells on both sides of the membrane were fixed with 10% formalin and stained with 0.01% crystal violet stain.The membrane was then washed with PBS and the cells attracted towards the serum were visualized under a light microscope and pictured (10×) under different field views.The number of migrated cells in control and overexpressed cells in 10 different fields were calculated using ImageJ software and the average value was represented in the graph.

Wound healing assay
1 × 10 4 MCF7 and MDA-MB-231 control, NLGN3 and ANK2 overexpressed cells were plated and grown up to 90% confluence in a 12-well plate (Falcon Becton Dickinson).Cells were then scratched with a sterile 200 μl pipette tip in each well.The cells were washed twice with 1× PBS and the image was captured such as cells at stage 1 that is 0 h.Images of the cells undergoing migration were then taken at different time points at a magnification of 10×.Quantitation of migrated cells was done by calculating the decrease in the area at all the observed time points with the help of ImageJ software.

Conclusion
Our integrated approach has enabled the identification of candidate genes in ER/PR+/HER-2−, ER/PR-/HER-2+, and TNBC in the early and late stages of BRCA.We found some interesting patterns e.g.genes which were upregulated in one molecular class, were downregulated in another class, and vice-versa.Additionally, some genes were found to be class specific.This difference might be due to differences in expression patterns of hormone receptors among the molecular classes.Significantly mutated subnetworks containing dysregulated, mutated and driver genes were also identified in different molecular classes of BRCA which gave insight into their biological functions.We believe that the dysregulated genes can serve as biomarkers for early detection of the class and stage of BRCA.Our approach seeks to maximize the use of datasets and techniques to understand the role of coding genes in disease pathogenesis, prognosis, and the development of diagnostic tools and therapeutics.We hope that identifying genes and their subnetworks will contribute to drug design and discovery.

Figure 1 .
Figure 1.Subnetwork for (a) Early stage ER/PR+/HER-2-.Seven dysregulated genes, coloured by the increasing mutation frequency (blue to red).NLGN3 and NRXN1 are frequently mutated while SYT13 is less mutated.The pathway analysis showed enrichment of TR/RXR activation pathway which activates PI3K pathway and regulates the biological process involved in cell survival and cell growth.(b) Early stage ER/PR-/ HER-2+ containing six dysregulated genes.The NOTCH2 and NOTCH4 are highly mutated.The subnetwork may be involved in Notch signaling pathway, Th1 and Th2 activation pathway and epithelial-mesenchymal transition pathway.(c) Early stage TNBC.Contains eight dysregulated genes among which TTN is highly mutated which is followed by SYNE1.Netrin signaling, RhoA signaling, and axonal guidance signaling pathways may be affected.(d) Late stage TNBC.Contains fourteen dysregulated genes.DMD2 and ANK2 are significantly enriched in mutations.The subnetwork may regulate galactose degradation I pathway, colanic acid building blocks biosynthesis and nNOS signaling pathway.

Figure 2 .
Figure 2. Mutational landscape of candidate genes showing deleterious functional impact in (a) 51 out of 60 samples, (b) 42 out of 59 samples (c) 91 out of 254 samples (d) 94 out of 164 samples.For each sample, the mutations per MB was calculated using number of deleterious mutations present.The mutation rate for a gene is based on deleterious missense mutation in number of samples.The mutation rate in NLGN3 is equal to BRCA1 (10 deleterious mutations).The mutation rate in TTN, SYNE1, ANK1 and ANK2 is greater than BRCA1 and BRCA2.

Figure 4 .
Figure 4. Real-Time PCR and western blot analysis of selected genes viz.NLGN3, MAML2, SYNE1 and ANK2. Figure shows relative mRNA expression level of a) NLGN3 b) MAML2 c) SYNE1 d) ANK2 in MCF10A, MCF7, MDA-MBA-231 and T47D cell lines.The GAPDH was used as control.Western blot illustrates the expression level of e) ANK2, f) SYNE1 and g) NLGN3 in different breast cancer cell lines (MDA-MB-231 and MCF7) and normal breast epithelial cell line (MCF10A).The blots were quantified by densitometry and subjected to statistical analysis and figures are shown in h), i) and j) respectively for ANK2, SYNE1 and NLGN3.Overall data are represented as mean ± SD.Student's t-test was used for the statistical analysis, n = 3 (***P ≤ 0.001, *P ≤ 0.05, Significant).Ns: no significant difference.

Figure 5 .
Figure 5. Figure showing the immunohistochemically stained TMA sections for a) NLGN3 b) SYNE1 and c) MAML2 in NAT, early stage TNBC, late stage TNBC, early stage ER/PR+/HER-2-and late stage ER/PR+/ HER-2-.d) NLGN3 expression is significantly decreased in early stage ER/PR+/HER-2-(p-value=0.0162) and early stage TNBC (p-value=0.0328).e) SYNE1 expression is significantly downregulated in early stage TNBC (pvalue=0.0410).f) Downregulated expression of MAML2 was found in all 4 categories but was not statistically significant.Right panel shows the immunohistochemically stained TMA sections for g) NLGN3 h) SYNE1 and i) MAML2 in NAT, grade 2 and grade 3 tumor.j) The expression of NLGN3 is significantly decreased in grade 2 (pvalue=0.0137)and grade 3 (p-value=0.0123)compared to NAT. k) There was a decrease in SYNE1 expression in grade 2 and 3 tumor samples but it was statistically insignificant.l) MAML2 expression is significantly decreased in grade 3 (p-value=0.017).Data are represented as mean ± SD.Student's t-test was used for the statistical analyses, n = 3 (*P < 0.05).

Figure 6 .
Figure 6.Overexpression of NLGN3 and ANK2 restricts breast cancer proliferation and migration.I. Cell proliferation quantified by MTT assay in control and NLGN3, ANK2 overexpressed MCF7 and MDA-MB-231 cells post 48hr.II.Overexpression of NLGN3 and ANK2 restricts breast cancer proliferation and migration.Colony forming assay showing number of colonies per plate along with graphical representation.III.Migration assay showing the number of migrated cells through trans-well chamber 48h after stimulation with 5% FBS.IV.Wound-healing showing 0 hr, 12 hr, and 24 hr post wounding and its graphical representation showing percentage wound closure in ctrl, ANK2O/E and NLGN3O/E-treated MCF7 and MDA-MB-231 cells.Data are represented as mean ± SD. 2 way ANOVA was used for the statistical analyses, n = 3 (****P < 0.0001,***P < 0.001, **P < 0.01, *P < 0.05).

Figure 7 .
Figure 7. a) The functions of LINC complex in normal cells.b) The proposed role of nesprin 1 (SYNE1) in progression of breast invasive carcinoma.

Figure 8 .
Figure 8.The flowchart of methodology.The Cancer RNA-seq Nexus (CRN) and International Cancer Genome Consortium (ICGC) were used as data sources as indicated in the figure.The differential expression analysis was performed on BRCA (breast invasive carcinoma) and NAT (Normal Adjacent Tissue) samples in three major molecular classes of BRCA in early and late stage.The genes were clustered into subnetworks based on their frequency of mutation.The pathway and functional enrichment analysis were performed on selected clusters of genes and selected genes were validated using qRT-PCR, western blot and immunohistochemistry. IPA=Ingenuity Pathway Analysis.qRT-PCR=Quantitative Real-Time Polymerase Chain Reaction.SNP=Single Nucleotide Polymorphism.

Table 2 .
The genes showed inverse expression pattern among different classes in early and late stage.The genes which showed upregulation in one class were found to be downregulated in other class and vice-versa.The table indicates the number of such genes.

Table 4 .
Summary of each component of the subnetwork.The table is showing heat score and fold change of each component.The status and mRNA expression of the components in expression atlas and Oncomine is also shown.* Log 2 FC>1 or Log 2 FC<-1, p-value < 0.05.

Table 5 .
Summary of each component of the subnetwork.The heat score value of each component is calculated based on the mutation frequency.The subnetwork is found to be involved in pathways e.g.Notch signaling pathway, Th1 and Th2 activation pathway and regulation of epithelial-mesenchymal transition pathway (EMT).* Log 2 FC>1 or Log 2 FC<-1, p-value < 0.05.

Table 7 .
Summary of each component of the subnetwork in late stage TNBC.The table also represents status and mRNA expression of each component in expression atlas and Oncomine.The subnetwork appears to be involved in pathways as galactose degradation I, colanic acid building blocks biosynthesis and nNOS signaling pathway.* Log 2 FC>1 or Log 2 FC<-1, p-value < 0.05.

Table 8 .
Mutational detail of the components of subnetwork showing total mutations (unique) across the four studies, TCGA; Cell 2015, TCGA; Nature 2012, TCGA; PanCancer Atlas and TCGA; Provisional using cbioportal.The cluster harbored several types of mutations as missense, fusion, splice, frame-shift, in-frame.The somatic mutation frequency of NLGN3 and NRXN1 was found to 1.0%.

Table 9 .
Mutational detail of the NOTCH gene and interactors representing total unique mutations across the four studies, TCGA; Cell 2015, TCGA; Nature 2012, TCGA; PanCancer Atlas and TCGA; Provisional.The components harbored several types of mutations as missense, fusion, splice, frame-shift, in-frame as shown in the table.

Table 10 .
Mutational detail of TTN gene and its interactors in early stage TNBC.The data represents the total unique mutations across the carcinoma samples across four studies (TCGA; Cell 2015, TCGA; Nature 2012, TCGA; PanCancer Atlas and TCGA; Provisional).The cluster harbored several types of mutations as missense, nonsense, fusion, splice, frame-shift, in-frame.The somatic mutation frequency of TTN was found to be 14.4% which is very high as compared to known genes (BRCA1 and BRCA2).Also, SYNE1 showed somatic mutation frequency as 4.4% which is even much greater than BRCA1 (1.7%) and BRCA2 (1.8%).

Table 11 .
Mutationalis involved in molecular functions such as protein kinase binding, ATPase binding and structural constituent of cytoskeletal.().

Table 12 .
Summary of the number of samples obtained for BRCA (breast invasive carcinoma) and NAT (normal adjacent tissue) in each stage of ER/PR+/HER-2−, ER/PR−/HER-2+ and TNBC (triple negative breast cancer).